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Abstract 

Higher-order tensor decompositions are 
analogous to the familiar Singular Value De- 
composition (SVD), but they transcend the 
limitations of matrices (second-order ten- 
sors). SVD is a powerful tool that has 
achieved impressive results in information 
retrieval, collaborative filtering, computa- 
tional linguistics, computational vision, and 
other fields. However, SVD is limited to 
two-dimensional an^ays of data (two modes), 
and many potential applications have three 
or more modes, which require higher-order 
tensor decompositions. This paper evalu- 
ates four algorithms for higher-order ten- 
sor decomposition: Higher-Order Singular 
Value Decomposition (HO-SVD), Higher- 
Order Orthogonal Iteration (HOOI), Shce 
Projection (SP), and Multislice Projection 
(MP). We measure the time (elapsed run 
time), space (RAM and disk space require- 
ments), and fit (tensor reconstruction accu- 
racy) of the four algorithms, under a vari- 
ety of conditions. We find that standaid im- 
plementations of HO-SVD and HOOI do not 
scale up to larger tensors, due to increasing 
RAM requirements. We recommend HOOI 
for tensors that are small enough for the 
available RAM and MP for larger tensors. 

1 Introduction 

Singular Value Decomposition (SVD) is growing 
increasingly popular as a tool for the analysis of 
two-dimensional arrays of data, due to its success 
in a wide variety of applications, such as informa- 
tion retrieval (Deerwester et al., 1990), collabora- 
tive filtering (Billsus and Pazzani, 1998), compu- 
tational hnguistics (Schiitze, 1998), computational 



vision (Brand, 2002), and genomics (Alter et al., 
2000). SVD is limited to two-dimensional arrays 
(matrices or second-order tensors), but many appli- 
cations require higher-dimensional arrays, known as 
higher-order tensors. 

There are several higher-order tensor decompo- 
sitions, analogous to SVD, that are able to cap- 
ture higher-order structure that cannot be modeled 
with two dimensions (two modes). Higher-order 
generalizations of SVD include Higher-Order Sin- 
gular Value Decomposition (HO-SVD) (De Lath- 
auwer et al., 2000a), Tucker decomposition (Tucker, 
1966), and PARAFAC (parallel factor analysis) 
(Harshman, 1970), which is also known as CAN- 
DECOMP (canonical decomposition) (Carroll and 
Chang, 1970). 

Higher-order tensors quickly become unwieldy. 
The number of elements in a matrix increases 
quadratically, as the product of the number of rows 
and columns, but the number of elements in a third- 
order tensor increases cubically, as a product of the 
number of rows, columns, and tubes. Thus there is 
a need for tensor decomposition algorithms that can 
handle large tensors. 

In this paper, we evaluate four algorithms 
for higher-order tensor decomposition: Higher- 
Order Singular Value Decomposition (HO-SVD) 
(De Lathauwer et al., 2000a), Higher-Order Orthog- 
onal Iteration (HOOI) (De Lathauwer et al., 2000b), 
Shce Projection (SP) (Wang and Ahuja, 2005), and 
Multislice Projection (MP) (introduced here). Our 
main concern is the ability of the four algorithms to 
scale up to large tensors. 

In Section 2, we motivate this work by listing 
some of the applications for higher-order tensors. In 
any field where SVD has been useful, there is likely 
to be a third or fourth mode that has been ignored, 
because SVD only handles two modes. 

The tensor notation we use in this paper is pre- 



sented in Section 3. We follow the notational con- 
ventions of Kolda (2006). 

Section 4 presents the four algorithms, HO-SVD, 
HOOI, SP, and MP. For HO-SVD and HOOI, we 
used the implementations given in the MATLAB 
Tensor Toolbox (Bader and Kolda, 2007a; Bader and 
Kolda, 2007b). For SP and MP, we created our own 
MATLAB implementations. Our implementation of 
MP for third-order tensors is given in the Appendix. 

Section 5 presents our empirical evaluation of the 
four tensor decomposition algorithms. In the ex- 
periments, we measure the time (elapsed run time), 
space (RAM and disk space requirements), and fit 
(tensor reconstruction accuracy) of the four algo- 
rithms, under a variety of conditions. 

The first group of experiments looks at how the 
algorithms scale as the input tensors grow increas- 
ingly larger. We test the algorithms with random 
sparse third-order tensors as input. HO-SVD and 
HOOI exceed the available RAM when given larger 
tensors as input, but SP and MP are able to process 
large tensors with low RAM usage and good speed. 
HOOI provides the best fit, followed by MP, then SP, 
and lastly HO-SVD. 

The second group of experiments examines the 
sensitivity of the fit to the balance in the ratios of 
the core sizes (defined in Section 3). The algorithms 
are tested with random sparse third-order tensors as 
input. In general, the fit of the four algorithms fol- 
lows the same pattern as in the first group of exper- 
iments (HOOI gives the best fit, then MP, SP, and 
HO-SVD), but we observe that SP is particularly 
sensitive to unbalanced ratios of the core sizes. 

The third group explores the fit with varying ratios 
between the size of the input tensor and the size of 
the core tensor. For this group, we move from third- 
order tensors to fourth-order tensors. The algorithms 
are tested with random fourth-order tensors, with the 
input tensor size fixed while the core sizes vary. The 
fit of the algorithms follows the same pattern as in 
the previous two groups of experiments, in spite of 
the move to fourth-order tensors. 

The final group measures the performance with 
a real (nonrandom) tensor that was generated for a 
task in computational linguistics. The fit follows the 
same pattern as in the previous three groups of ex- 
periments. Furthermore, the differences in fit are re- 
flected in the performance on the given task. This 
experiment validates the use of random tensors in 
the previous three groups of experiments. 

We conclude in Section 6. There are tradeoffs 
in time, space, and fit for the four algorithms, such 



that there is no absolute winner among the four algo- 
rithms. The choice will depend on the time, space, 
and fit requirements of the given application. If good 
fit is the primary concern, we recommend HOOI for 
smaller tensors that can fit in the available RAM, and 
MP for larger tensors. 

2 Applications 

A good survey of applications for tensor decompo- 
sitions for data analysis is Acar and Yener (2007), 
which lists several applications, including electroen- 
cephalogram (EEG) data analysis, spectral analysis 
of chemical mixtures, computer vision, and social 
network analysis. Kolda (2006) also lists various 
applications, such as psychometrics, image analysis, 
graph analysis, and signal processing. 

We believe that a natural place to look for appli- 
cations for tensor decompositions is wherever SVD 
has proven useful. We have grown accustomed to 
thinking of data in terms of two-dimensional tables 
and matrices; in terms of what we can handle with 
SVD. However, real applications often have many 
more modes, which we have been ignoring. 

In information retrieval (Deerwester et al., 1990), 
SVD is typically applied to a term x document ma- 
trix, where each row represents a word and each col- 
umn represents a document in the collection. An 
element in the matrix is a weight that represents the 
importance of the given word in the given document. 
SVD smoothes the weights, so that a document d 
will have a nonzero weight for a word u; if c? is simi- 
lar to other documents that contain the word w, even 
if d does not contain actually contain w. Thus a 
search for w will return the document d, thanks to 
the smoothing effect of SVD. 

To extend the term-document matrix to a third- 
order tensor, it would be natural to add information 
such as author, date of publication, citations, and 
venue (e.g., the name of the conference or journal). 
For example, Dunlavy et al. (2006) used a tensor 
to combine information from abstracts, titles, key- 
words, authors, and citations. Chew et al. (2007) ap- 
plied a tensor decomposition to a term x document 
X language tensor, for cross-language information 
retrieval. Sun et al. (2006) analyzed an author x 
keyword x date tensor. 

In collaborative filtering (Billsus and Pazzani, 
1998), SVD is usually applied to a user x item ma- 
trix, in which each row represents a person and each 
column represent an item, such as a movie or a book. 
An element in the matrix is a rating by the given user 
for the given item. Most of the elements in the ma- 



trix are missing, because each user has only rated a 
few items. When a zero element represents a miss- 
ing rating, SVD can be used to guess the missing 
ratings, based on the nonzero elements. 

The user-item matrix could be extended to a third- 
order tensor by adding a variety of information, such 
as the words used to describe an item, the words 
used to describe the interests of a user, the price of 
an item, the geographical location of a user, and the 
age of a user. For example, Mahoney et al. (2006) 
and Xu et al. (2006) applied tensor decompositions 
to collaborative filtering. 

In computational linguistics, SVD is often applied 
in semantic space models of word meaning. For ex- 
ample, Landauer and Dumais (1997) applied SVD to 
a word x document matrix, achieving human-level 
scores on multiple-choice synonym questions from 
the TOEFL test. Turney (2006) applied SVD to a 
word-pair x pattern matrix, reaching human-level 
scores on multiple-choice analogy questions from 
the SAT test. 

In our recent work, we have begun exploring ten- 
sor decompositions for semantic space models. We 
are currently developing a word x pattern x word 
tensor that can used for both synonyms and analo- 
gies. The experiments in Section 5.4 evaluate the 
four tensor decomposition algorithms using this ten- 
sor to answer multiple-choice TOEFL questions. 

In computational vision, SVD is often applied to 
image analysis (Brand, 2002). To work with the 
two-mode constraint of SVD, an image, which is 
naturally two-dimensional, is mapped to a vector. 
For example, in face recognition, SVD is applied to 
a. face x image-vector matrix, in which each row is 
a vector that encodes an image of a person's face. 
Wang and Ahuja (2005) pointed out that this two- 
mode approach to image analysis is ignoring essen- 
tial higher-mode structure in the data. The experi- 
ments in Wang and Ahuja (2005) demonstrate that 
higher-order tensor decompositions can be much 
more effective than SVD. 

In summary, wherever SVD has been useful, we 
expect there are higher-order modes that have been 
ignored. With algorithms that can decompose large 
tensors, it is no longer necessary to ignore these 
modes. 

3 Notation 

This paper follows the notational conventions of 
Kolda (2006). Tensors of order three or higher 
are represented by bold script letters, X. Matrices 
(second-order tensors) are denoted by bold capital 



letters, A. Vectors (first-order tensors) are denoted 
by bold lowercase letters, b. Scalars (zero-order ten- 
sors) are represented by lowercase italic letters, i. 

The i-th element in a vector b is indicated by 6j. 
The i-th row in a matrix A is denoted by a^^, the j-th 
column is given by a-j, and the element in row i and 
column j is represented by a-ij. 

A third-order tensor X has rows, columns, and 
tubes. The element in row i, column j, and tube 
k is represented by Xijk- The row vector that con- 
tains Xijk is denoted by Xj:/^., the column vector is 
x.jfc, and the tube vector is Xjj:. In general, the vec- 
tors in a tensor (e.g., the rows, columns, and tubes 
in a third-order tensor) are called fibers. There are 
no special names (beyond rows, columns, and tubes) 
for fibers in tensors of order four and higher. 

A third-order tensor X contains matrices, called 
slices. The horizontal, lateral, and frontal slices of X 
are represented by Xj::, 'K-j-, and X::^, respectively. 
The concept of slices also applies to tensors of order 
four and higher. 

An index i ranges from 1 to /; that is, the upper 
bound on the range of an index is given by the upper- 
case form of the index letter. Thus the size of a ten- 
sor X is denoted by uppercase scalar^s, Ii x I2 x I^. 

There are several kinds of tensor products, but we 
only need the n-mode product in this paper. The 
n-mode product of a tensor X and a matrix A is 
written as X x „ A. Let X be of size Ii x I2 x L^ 
and let A be of size Ji x J2. The ?i-mode prod- 
uct X X „ A multiplies fibers in mode n of X with 
row vectors in A. Therefore n-mode multiplication 
requires that /„ = J2. The result of X x^^ A is a 
tensor with the same order (same number of modes) 
as X, but with the size /„ replaced by Ji. For ex- 
ample, the result of X x 3 A is of size Ii x I2 x Ji, 
assuming I3 = J2- 

Let X be an A^-th order tensor of size IiX. . .xl^ 
and let A be a matrix of size J x /„. Suppose that 
y = X x.„ A. Thus y is of size Ii x . . . x /„_i x 
J X In+i X ... X In . The elements of 'H are defined 
as follows: 



In 

yil...i„-lji„+l...iN ~ / J Xii...iN '^jin (^) 

i„=l 

The transpose of a matrix A is written as A^. We 
may think of the classical matrix product AB as a 
special case of n-mode product: 



AB = A X2B'^ = B xi A 



(2) 



Fibers in mode two of A (row vectors) ai^e multi- 
plied with row vectors in B^, which are column vec- 
tors (mode one) in B. 

A tensor X can be unfolded into a matrix, which 
is called matricization. The n-mode matricization of 
X is written X(„\, and is formed by taking the mode 
n fibers of X and making them column vectors in 
X(„). Let X be a tensor of size /i x /2 x -^3- The 
one-mode matricization X(i) is of size /i x {I2 Is): 



X(i) = [x;ii X;2i ... y^-.hh] (3) 

= [X,iX,2 ... y^-.j-A (4) 

Similarly, the two-mode matricization X(2) is of size 

h X (/i h): 

X(2) = [Xi:i X2:l ... X/,:/3] (5) 

= [X^.X^^ ... X^.J (6) 

Note that y = Xx„Aif andonly if Y(,„) = AX(„). 
Thus n-mode matricization relates the classical ma- 
trix product to the n-mode tensor product. In the 
special case of second-order tensors. Cm) = C and 

C(2) = C^, hence C = B Xi A if and only if 
C = AB. Likewise C = B X2 A if and only if 
C^ = AB^. 

Let S be a tensor of size Ji x ... x Jf<[. Let 
A(^) , . . . , A^^) be matrices such that A^") is of size 
In X Jn- The Tucker operator is defined as follows 
(Kolda, 2006): 

[S;A«,A(2),...,AW1 

= SxiA«X2A(2)...x;vAW (7) 

The resulting tensor is of size /i x . . . x /jy. 

Let X be a tensor of size /i x . . . x /yv- The Tucker 
decomposition of X has the following form (Tucker, 
1966; Kolda, 2006): 



X«[g;A«,...,A(^)l 



(8) 



The tensor S is called the core of the decomposition. 
Let S be of size Ji x . . . x Jj\f. Each matrix A^") is 

of size In X Jn. 

The n-rank of a tensor X is the rank of the ma- 
trix X(„). For a second-order tensor, the one-rank 
necessarily equals the two-rank, but this is not true 
for higher-order tensors. If J„ is equal to the n-rank 
of X for each n, then it is possible for the Tucker 



decomposition to exactly equal X. In general, we 
want Jn less than the n-rank of X for each n, yield- 
ing a core S that has lower n-ranks than X, analo- 
gous to a truncated (thin) SVD. In the special case 
of a second-order tensor, the Tucker decomposition 
X fa [S ;U,V| is equivalent to the thin SVD, 
X « USV"^. 

Suppose we have a tensor X and its Tucker de- 
composition X = [S ; A(i), . . . , A(^)|, such that 
X ss X. In the experiments in Section 5, we mea- 
sure the^z? of the decomposition X to the original X 
as follows: 



fit(X,X) = 1 



J\j iA- 



IXI 



(9) 



The Frobenius norm of a tensor X, IIXIL, is the 
square root of the sum of the absolute squares of 
its elements. The fit is a normalized measure of the 
eiTor in reconstructing X from its Tucker decompo- 
sition X. When X = X, the fit is 1; otherwise, it 
is less than 1, and it may be negative when the fit is 
particularly poor. 

The equivalence between the n-mode tensor prod- 
uct and the classical matrix product with n-mode 
matricization suggests that tensors might be merely 
a new notation; that there may be no advantage to 
using the Tucker decomposition with tensors instead 
of using SVD with unfolded (matricized) tensors. 
Perhaps the different layers (slices) of the tensor do 
not actually interact with each other in any interest- 
ing way. This criticism would be appropriate if the 
Tucker decomposition used only one mode, but the 
decomposition uses all N modes of X. Because all 
modes are used, the layers of the tensor are thor- 
oughly mixed together. 

For example, suppose X w [S ;A, B,C]. Let 
Xj:: be a slice of X. There is no slice of S, say 
Gj--, such that we can reconstruct Xj;; from Gj-,-,, 
using A, B, and C. We need all of S in order to 
reconstruct X^;;. 

All four of the algorithms that we examine in 
this paper perform the Tucker decomposition. One 
reason for our focus on the Tucker decomposition 
is that Bro and Andersson (1998) showed that the 
Tucker decomposition can be combined with other 
tensor decompositions, such as PARAFAC (Harsh- 
man, 1970; Carroll and Chang, 1970). In general, 
algorithms for the Tucker decomposition scale to 
large tensors better than most other tensor decom- 
position algorithms; therefore it is possible to im- 
prove the speed of other algorithms by first com- 



pressing the tensor with the Tucker decomposition. 
The slower algorithm (such as PARAFAC) is then 
applied to the (relatively small) Tucker core, instead 
of the whole (large) input tensor (Bro and Ander- 
sson, 1998). Thus an algorithm that can perform 
the Tucker decomposition with large tensors makes 
it possible for other kinds of tensor decompositions 
to be applied to large tensors. 

4 Algorithms 

This section introduces the four tensor decompo- 
sition algorithms. All four algorithms take as in- 
put an arbitrary tensor X and a desired core size 
Ji X . . . X J^v and generate as output a Tucker de- 
composition X = [S ; A(^\ . . . , A^^^J, in which 
the matrices A^") are orthonormal. 

For HO-SVD (Higher-Order Singular Value De- 
composition) and HOOI (Higher-Order Orthogonal 
Iteration), we show the algorithms specialized for 
third-order tensors and generalized for arbitrary ten- 
sors. For SP (Slice Projection) and MP (Multislice 
Projection), we present the algorithms for third- 
order and fourth-order tensors and leave the gen- 
eralization for arbitrary tensors as an excercise for 
the reader. (There is a need for a better notation, to 
write the generalization of SP and MP to arbitrary 
tensors.) 

4.1 Higher-Order SVD 

Figure 1 presents the HO-SVD algorithm for third- 
order tensors. Figure 2 gives the generalization of 
HO-SVD for tensors of arbitrary order (De Lath- 
auwer et al., 2000a; Kolda, 2006). In the follow- 
ing experiments, we used the implementation of 
HO-SVD in the MATLAB Tensor Toolbox (Bader 
and Kolda, 2007b). HO-SVD is not a distinct func- 
tion in the Toolbox, but it is easily extracted from the 
Tucker Alternating Least Squares function, where it 
is a component. 

HO-SVD does not attempt to optimize the fit, 
fit(X, X) (Kolda, 2006). That is, HO-SVD does 
not produce an optimal rank-Ji, . . . , Jat approxi- 
mation to X, because it optimizes for each mode 
separately, without considering interactions among 
the modes. However, we will see in Section 5 that 
HO-SVD often produces a reasonable approxima- 
tion, and it is relatively fast. For more information 
about HO-SVD, see De Lathauwer et al. (2000a). 

4.2 Higher-Order Orthogonal Iteration 

Figure 3 presents the HOOI algorithm for third- 
order tensors. Figure 4 gives the generalization of 



HOOI for tensors of arbitrary order (De Lathauwer 
et al., 2000b; Kolda, 2006). HOOI is implemented 
in the MATLAB Tensor Toolbox (Bader and Kolda, 
2007b), in the Tucker Alternating Least Squares 
function. 

HOOI uses HO-SVD to initialize the matrices, be- 
fore entering the main loop. The implementation in 
the MATLAB Tensor Toolbox gives the option of 
using a random initialization, but initialization with 
HO-SVD usually results in a better fit. 

In the main loop, each matrix is optimized in- 
dividually, while the other matrices are held fixed. 
This general method is called Alternating Least 
Squares (ALS). HOOI, SP and MP all use ALS. 

The main loop terminates when the change in fit 
drops below a threshold or when the number of itera- 
tions reaches a maximum, whichever comes first. To 
calculate the fit for each iteration, HOOI first calcu- 
lates the core S using [X ; A^^^"^, . . . , A^^)"^], and 
then calculates X from [[S ; A^ ^ . . . , A^^^. The 
change in fit is the fit of the Tucker decomposition 
after the t-th iteration of the main loop minus the fit 
from the previous iteration: 



Afit(t) = fit(x,x^*^: 



fit(X,X^* ^^) (10) 



In the experiments, we set the threshold for Agt (t) at 
10^'^ and we set the maximum number of iterations 
at 50. (These are the default values in the MATLAB 
Tensor Toolbox.) The main loop usually terminated 
after half a dozen iterations or fewer, with Afit(t) 
less than 10^^. 

As implemented in the MATLAB Tensor Tool- 
box, calculating the HO-SVD initialization, the in- 
termediate tensor Z, and the change in fit, Afit(i), 
requires bringing the entire input tensor X into 
RAM. Although sparse representations are used, this 
requirement limits the size of the tensors that we can 
process, as we see in Section 5.1. For more informa- 
tion about HOOI, see De Lathauwer et al. (2000b) 
and Kolda (2006). 

4.3 Slice Projection 

Figure 5 presents the SP algorithm for third-order 
tensors (Wang and Ahuja, 2005). Although Wang 
and Ahuja (2005) do not discuss tensors beyond the 
third-order, the SP algorithm generalizes to tensors 
of arbitrary order. For example. Figure 6 shows SP 
for fourth-order tensors. 

Instead of using HO-SVD, Wang and Ahuja 
(2005) initialize SP randomly, to avoid bringing X 



in: Tensor X of size Ii x I2 x I3. 

in: Desired rank of core: Ji x J2 x J3. 

A <— Ji leading eigenvectors of X/^^XX n - Xn) is the unfolding of X on mode 1 

B ^ J2 leading eigenvectors of X(2)XJ2^ 
C ^ J3 leading eigenvectors of X(3)XXn 

S^[X;AT,bT,cT] 

out: S of size Ji x J2 x J3 and orthonormal matrices A of 
size /i X Ji, B of size I2 x J2, and C of size I^ x J3, 
such that X« |g;A,B,C]]/ 

Figure 1: Higher-Order Singular Value Decomposition for third-order tensors (De Lathauwer et al., 2000a). 



in: Tensor X of size Ii x I2 x ■ ■ ■ x Ij^. 

in: Desired rank of core: Ji x J2 x • • • x J^r. 

for n = 1, . . . , A^ do 

A^") ^- Jn leading eigenvectors of X(„)XT x - X^^) is the unfolding of X on mode n 

end for 

S^[X;AWT,...,aWT 



out: S of size Ji x J2 x • • • x Jjy and orthonormal matrices 
A(") of size /„ X Jn such that X w |g ; A^^), . . . , A(^)]]. 



Figure 2: Higher-Order Singular Value Decomposition for tensors of arbitrary order (De Lathauwer et al. 
2000a). 



in: Tensor X of size Ii x I2 x I3. 

in: Desired rank of core: Ji x J2 x J3. 

B ^ J2 leading eigenvectors of X(2)XT2n 
C ^ J3 leading eigenvectors of X(3)Xl^^ 

while not converged do 

A ^ Ji leading eigenvectors of U(l)UX^ 
V^[X;AT,I2,CT1 
B ^ J2 leading eigenvectors of V(2)VL) 
W^[X;AT,BT,l3j 
C ^ J3 leading eigenvectors of W(3)WL> 
end while 



initialization via HO-SVD 



- main loop 



S^[X;AT,bT,cT] 

out: S of size Ji x J2 x J3 and orthonormal matrices A of 
size /i X Ji, B of size I2 x J2, and C of size L^ x J3, 
such that X« |g;A,B,C]]. 



Figure 3: Higher-Order Orthogonal Iteration for third-order tensors (De Lathauwer et al., 2000b; Kolda, 
2006). Note that it is not necessary to initialize A, since the while loop sets A using B and C. Ij is the 
identity matrix of size /j x /j. 



in: Tensor X of size Ii x I2 x ■ ■ ■ x Ij^. 

in: Desired rank of core: Ji x J2 x • • • x J^r. 

forn = 2, . . . , Af do 

A(") ^- Jn leading eigenvectors of X(„)XT x 
end for 



- initiahzation via HO-SVD 



while not converged do 
for n = 1, . . . , A^ do 

2, ^ [X ; A(i)T, . . . , A("-i)T, I„, A(-+i)T, . . . , A^^] 
A(") <— J„ leading eigenvectors of Z(-„)Z7 ^ 
end for 
end while 

S-[X;AWT,...,AWT] 

out: S of size Ji x J2 x • • • x Jjv and orthonormal matrices 
A(") of size /„ X Jn such that X « |g ; A^^), . . . , A^^)]. 



- main loop 



Figure 4: Higher-Order Orthogonal Iteration for tensors of arbitrary order (De Lathauwer et al., 2000b; 
Kolda, 2006). I„ is the identity matrix of size I„ x J„. 



in: Tensor X of size Ii x I2 x I^. 

in: Desired rank of core: Ji x J2 x J3. 



C <— random matrix of size I3 x J3 



normalize columns to unit length 



while not converged do 

Mi3^ J^X:i;CC"^x;^^ 

i=l 

A ■<— Ji leading eigenvectors of M13MI3 
h 

B <— J2 leading eigenvectors of M2iMJj^ 

M32^ J]xTBB"^Xi:: 
i=\ 

C ^ J3 leading eigenvectors of M32MJ2 
end while 



main loop 
slices on mode 2 



slices on mode 3 



slices on mode 1 



S^[X;AT,bT,cT] 

out: S of size Ji x J2 x J3 and orthonormal matrices A of 
size I\ X Ji, B of size I2 x J 2, and C of size I^ x J^, 
such that X« |g;A,B,C]]/ 



Figure 5: Slice Projection for third-order tensors (Wang and Ahuja, 2005). Note that it is not necessary to 
initialize A and B. 



in: Tensor X of size Ii x I2 x I^ x I4. 

in: Desired rank of core: Ji x J2 x J3 x J4. 

D <— random matrix of size I4 x J4 

while not converged do 

h h 

i=l i=l 

A ^ Ji leading eigenvectors of M14MI4 

B <— J2 leading eigenvectors of M21M21 
h h 

C ^ J3 leading eigenvectors of M32M32 
h h 

i=l j=l 

D ^ J4 leading eigenvectors of M43M43 
end while 

S^IX;AT,bT,cT,dTJ 

out: S of size Ji x J2 x J3 x J4 and orthonormal matrices 
A of size /i X Ji, B of size I2 x J2, C of size I^ x J3, 
and D of size I4 x J4, such that X « [[S ; A, B, C, D]. 



- normalize columns to unit length 

- main loop 

- slices on modes 2 and 3 



- slices on modes 3 and 4 



slices on modes 1 and 4 



- slices on modes 1 and 2 



Figure 6: Slice Projection for fourth-order tensors. 



into RAM. In Figure 5, the matrix C is is filled with 
random numbers from the uniform distribution over 
[0, 1] and then the columns are normalized. 

Note that HO-SVD calculates each matrix from X 
alone, whereas HOOI calculates each matrix from 
X and all of the other matrices. SP lies between 
HO-SVD and HOOI, in that it calculates each matrix 
from X and one other matrix. 

In the main loop, the input tensor X is processed 
one slice at a time, again to avoid bringing the whole 
tensor into RAM. Before entering the main loop, the 
first step is to calculate the slices and store each slice 
in a file. MP requires this same first step. The MAT- 
LAB source code for MP, given in the Appendix, 
shows how we calculate the slices of X without 
bringing all of X into RAM. 

Our approach to constructing the slice files as- 
sumes that the input tensor is given in a sparse rep- 
resentation, in which each nonzero element of the 
tensor is described by one line in a file. The descrip- 
tion consists of the indices that specify the location 
of the nonzero element, followed by the value of the 
nonzero element. For example, the element Xijk of 
a third-order tensor X is described as {i,j, k, Xijk)- 
To calculate the n-mode slices, we first sort the in- 
put tensor file by mode n. For example, we generate 
two-mode slices by sorting on j, the second column 
of the input file. This puts all of the elements of an 
n-mode slice together consecutively in the file. Af- 
ter sorting on mode n, we can read the sorted file 
one slice at a time, writing each mode n slice to its 
own unique file. 

To sort the input file, we use the Unix sort com- 
mand. This command allows the user to specify the 
amount of RAM used by the sort buffer. In the fol- 
lowing experiments, we arbitrarily set the buffer to 
4 GiB, half the available RAM. (For Windows, the 
Unix sort command is included in Cygwin.) 

The main loop terminates after a maximum num- 
ber of iterations or when the core stops growing, 
whichever comes first. The growth of the core is 
measured as follows: 



Ag(t) = 1 



5{t-l) 



;(i) 



(11) 



In this equation, S is the core after the t-th iter- 
ation. We set the threshold for Ag(t) at 10"^ and 
we set the maximum number of iterations at 50. The 
main loop usually terminated after half a dozen iter- 
ations or fewer, with Ag(i) less than 10~"^. 



SP uses Ag(t) as a proxy for Afit(t), to avoid 
bringing X into RAM. With each iteration, as the 
estimates for the matrices improve, the core captures 
more of the variation in X, resulting in growth of the 
core. It is not necessary to bring X into RAM in or- 
der to calculate S*-*'; we can calculate S*-*^ one slice 
at a time, as given in the Appendix. 

For more information about SP, see Wang and 
Ahuja (2005). Wang et al. (2005) introduced another 
low RAM algorithm for higher-order tensors, based 
on blocks instead of slices. 

4.4 Multislice Projection 

Figure 7 presents the MP algorithm for third-order 
tensors. The MP algorithm generalizes to arbitrary 
order. Figure 8 shows MP for fourth-order tensors. 

The basic structure of MP is taken from SP, but 
MP takes three ideas from HOOI: (1) use HO-SVD 
to initialize, instead of random initialization, (2) use 
fit to determine convergence, instead of using the 
growth of the core, (3) use all of the other matri- 
ces to calculate a given matrix, instead of using only 
one other matrix. Like SP, MP begins by calculating 
all of the slices of the input tensor and storing each 
slice in a file. See the Appendix for details. 

We call the initialization pseudo HO-SVD initial- 
ization, because it is not exactly HO-SVD, as can be 
seen by comparing the initialization in Figure 3 with 
the initialization in Figure 7. Note that X(2) in Fig- 
ure 3 is of size I2 x (Ii I^), whereas M2 in Figure 7 
is of size I2 x I2, which is usually much smaller. 
HO-SVD brings the whole tensor into RAM, but 
pseudo HO-SVD processes one slice at a time. 

The main loop terminates when the change in fit 
drops below a threshold or when the number of it- 
erations reaches a maximum, whichever comes first. 
We calculate the fit one slice at a time, as given in 
the Appendix; it is not necessary to bring the whole 
input tensor into RAM in order to calculate the fit. 
We set the threshold for Afit(t) at 10~^ and we set 
the maximum number of iterations at 50. The main 
loop usually terminated after half a dozen iterations 
or fewer, with Afit(t) less than 10"'^. 

The most significant difference between SP and 
MP is that MP uses all of the other matrices to cal- 
culate a given matrix. For example, M13 in Figure 5 
is based on X and C, whereas the con^esponding Mi 
in Figure 7 is based on X, B, and C. In this respect, 
MP is like HOOI, as we can see with the correspond- 
ing U in Figure 3. By slicing on two modes, instead 
of only one, we improve the fit of the tensor, as we 
shall see in the next section. 
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in: Tensor X of size Ii x I2 x I^. 

in: Desired rank of core: Ji x J2 x J3. 



h h 

i=l i=l 

B <— J2 leading eigenvectors of M2M2 
h h 

C ^ J3 leading eigenvectors of M3MJ 

wiiile not converged do 

/3 h 

Ml ^ ^ X.iBBTxT + Y^ X.i-CC'^X.l 

i=l i=l 

A ^ Ji leading eigenvectors of MiM]" 

/3 h 

M2 ^ ^X;r,AATX::, + ^X,::CCTX^^ 

B ^ J2 leading eigenvectors of M2MJ 

h h 

M3 ^ ^XT AA^X^i: + ^X^BB^Xi, 

C ^ J3 leading eigenvectors of M3MJ 
end while 



pseudo HO-SVD initialization 



main loop 

slices on modes 2 and 3 



- slices on modes 1 and 3 



- slices on modes 1 and 2 



T T3T r^Tn 



S^[X;AT,bT,C 

out: S of size Ji x J2 x J3 and orthonormal matrices A of 
size /i X Ji, B of size I2 x J2, and C of size I3 x J3, 
such that Xss [[g;A,B,Cl. 



Figure 7: Multislice Projection for third-order tensors. MATLAB source code for this algorithm is provided 
in the Appendix. 
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in: Tensor X of size Ii x I2 x I^ x I4. 

in: Desired rank of core: Ji x J2 x J3 x J4. 

h h h h h I3 

i=l j=l 4=1 j=l i=l j=l 

B <— J2 leading eigenvectors of M2MJ 

h h h h h h 

M3 - EE^-.^-^+EE^-.^-^+EE^*^^^^^:: 

j=l j = l i=l j = l i=l j = l 

C ^ J3 leading eigenvectors of M3M3 

m4-eex;^^x,,.+x;x:x^^x,,.+x:exi^x,,. 

i=l j=l i=l j=l i=l j=l 

D <— J4 leading eigenvectors of M4M4 
while not converged do 

Ml ^ E E X:.,bbTx;^^ + f^ f^ x.,.^ccTx;^^^ + flfl x^^^dd^x;^ 

i=l j = l i=l j = l 1=1 j = l 

A <— Ji leading eigenvectors of MiM^ 

h h h h h h 

M2 - E E x;^, AA"^x,,, + Y.Y. x.:,ccTx^^^ +T.Y. x.,:Ddtx^ 

«=1 j=l i=l j=l j=l j=l 

B <— J2 leading eigenvectors of M2MJ 

h h h h h h 

M3 - E E X;^;, AA^X.,:, + E E X^^^BB^X,,, + Y^Y1 X.,::DDTxI 

j=l j=l i=l J=l J=l J=l 

C ^ J3 leading eigenvectors of M3MJ 

M4 - E E x;^ AA^x.,,. + E E x^^BB^x,:,. + E E x^^cc-^x,,, 

j=l jf=l i=l j=l i=l j=l 

D ^ J4 leading eigenvectors of M4M4 
end while 

S^IX;AT,bT,cT,dTJ 

out: S of size Ji x J2 x J3 x J4 and orthonormal matrices 
A of size Ji X Ji, B of size I2 x J2, C of size I3 x J3, 
and D of size I4 x J4, such that X R^ [[S ; A, B, C, D]. 

Figure 8: Multislice Projection for fourth-order tensors. 



12 



5 Experiments 

This section presents the four groups of experiments. 
The hardware for these experiments was a computer 
with two dual-core AMD Opteron 64 processors, 8 
GiB of RAM, and a 16 GiB swap file. The software 
was 64 bit Suse Linux 10.0, MATLAB R2007a, and 
MATLAB Tensor Toolbox Version 2.2 (Bader and 
Kolda, 2007b). The algorithms only used one of 
the four cores; we did not attempt to perform par- 
allel processing, although SP and MP could be pai^- 
allelized readily. 

The input files are plain text files with one line for 
each nonzero value in the tensor. Each line consists 
of integers that give the location of the nonzero value 
in the tensor, followed by a single real number that 
gives the nonzero value itself. The input files are 
in text format, rather than binary format, in order to 
facilitate sorting the files. 

The output files are binary MATLAB files, con- 
taining the tensor decompositions of the input files. 
The four algorithms generate tensor decompositions 
that are numerically different but structurally identi- 
cal. That is, the numerical values are different, but, 
for a given input tensor, the four algorithms generate 
core tensors and matrices of the same size. There- 
fore the output file size does not depend on which 
algorithm was used. 

5.1 Varying Tensor Sizes 

The goal of this group of experiments was to eval- 
uate the four algorithms on increasingly larger ten- 
sors, to discover how their performance scales with 
size. HO-SVD and HOOI assume that the input ten- 
sor fits in RAM, whereas SP and MP assume that 
the input tensor file must be read in blocks. We ex- 
pected that HO-SVD and HOOI would eventually 
run out of RAM, but we could not predict precisely 
how the four algorithms would scale, in terms of fit, 
time, and space. 

Table 1 summarizes the input test tensors for the 
first group of experiments. The test tensors are ran- 
dom sparse third-order tensors, varying in size from 
250'^ to 2000'^. The number of nonzeros in the ten- 
sors varies from 1.6 million to 800 million. The 
nonzero values are random samples from a uniform 
distribution between zero and one. 

Table 2 shows the results of the first group of ex- 
periments. HO-SVD and HOOI were only able to 
process the first four tensors, with sizes from 250'^ to 
1000^. The 1000^ tensor required almost 16 GiB of 
RAM. The next tensor, 1250^, required more RAM 
than was available (24 GiB; 8 GiB of actual RAM 



plus a 16 GiB swap file). On the other hand, SP 
and MP were able to process all eight tensors, up 
to 2000"^. Larger tensors are possible with SP and 
MP; the limiting factor becomes run time, rather 
than available RAM. 

Figure 9 shows the fit of the four algorithms. 
HOOI has the best fit, followed by MR then SP, 
and finally HO-SVD. The curves for HO-SVD and 
HOOI stop at 100 million nonzeros (the 1000^ ten- 
sor), but it seems likely that the same trend would 
continue, if sufficient RAM were available to apply 
HO-SVD and HOOI to the larger tensors. 

The fit is somewhat low, at about 4%, due to the 
difficulty of fitting a random tensor with a core size 
that is 0. 1 % of the size of the input tensor. However, 
we are interested in the relative ranking of the four 
algorithms, rather than the absolute fit. The results 
in Section 5.4 show that the ranking we see here, in 
Figure 9, is predictive of the relative performance on 
a real (nonrandom) task. 

Figure 10 shows the RAM use of the algorithms. 
As we can see in Table 2, there are two components 
to the RAM use of SP and MP, the RAM used by 
sort and the RAM used by MATLAB. We arbitrarily 
set the sorting buffer to 4 GiB, which sets an upper 
bound on the RAM used by sort. A machine with 
less RAM could use a smaller sorting buffer. We 
have not experimented with the buffer size, but we 
expect that the buffer could be made much smaller, 
with only a slight increase in run time. The growth 
of the MATLAB component of RAM use of SP and 
MP is slow, especially in comparison to HO-SVD 
and HOOI. 

Figure 1 1 gives the run time. For the smallest ten- 
sors, SP and MP take longer to run than HO-SVD 
and HOOI, because SP and MP make more use of 
files and less use of RAM. With a tensor size of 
lOOO^, both HO-SVD and HOOI use up the avail- 
able hai^dware RAM (8 GiB) and need to use the vir- 
tual RAM (the 16 GiB swap file), which explains 
the sudden upward surge in Figure 1 1 at 100 million 
nonzeros. In general, the run time of SP and MP is 
competitive with HO-SVD and HOOI. 

The results show that SP and MP can handle much 
larger tensors than HO-SVD and HOOI (800 million 
nonzeros versus 100 million nonzeros), with only a 
small penalty in run time for smaller tensors. How- 
ever, HOOI yields a better fit than MP. If fit is impor- 
tant, we recommend HOOI for smaller tensors and 
MP for larger tensors. If speed is more important, 
we recommend HO-SVD for smaller tensors and SP 
for larger tensors. 
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Input tensor size 


Core size 


Density 


Nonzeros 


Input file 


Output file 


(II X /2 X /3) 


(Jl X J2 X J3) 


(% Nonzero) 


(Millions) 


(GiB) 


(MiB) 


250 X 250 X 250 


25 X 25 X 25 


10 


1.6 


0.03 


0.3 


500 X 500 X 500 


50 X 50 X 50 


10 


12.5 


0.24 


1.5 


750 X 750 X 750 


75 X 75 X 75 


10 


42.2 


0.81 


4.3 


1000 X 1000 X 1000 


100 X 100 X 100 


10 


100.0 


1.93 


9.5 


1250 X 1250 X 1250 


125 X 125 X 125 


10 


195.3 


3.88 


17.7 


1500 X 1500 X 1500 


150 X 150 X 150 


10 


337.5 


6.85 


29.7 


1750 X 1750 X 1750 


175 X 175 X 175 


10 


535.9 


11.03 


46.0 


2000 X 2000 X 2000 


200 X 200 X 200 


10 


800.0 


16.64 


67.4 



Table 1 : Random sparse third-order tensors of varying size. 



Algorithm 


Tensor 


Nonzeros 


Fit 


Run time 


Matlab RAM 


Sort RAM 


Total RAM 






(Millions) 


(%) 


(HH:MM:SS) 


(GiB) 


(GiB) 


(GiB) 


HO-SVD 


250^ 


1.6 


3.890 


00:00:24 


0.21 


0.00 


0.21 


HO-SVD 


500^ 


12.5 


3.883 


00:03:44 


1.96 


0.00 


1.96 


HO-SVD 


750^ 


42.2 


3.880 


00:14:42 


6.61 


0.00 


6.61 


HO-SVD 


1000^ 


100.0 


3.880 


01:10:13 


15.66 


0.00 


15.66 


HOOI 


250^ 


1.6 


4.053 


00:01:06 


0.26 


0.00 


0.26 


HOOI 


500^ 


12.5 


3.982 


00:09:52 


1.98 


0.00 


1.98 


HOOI 


750^ 


42.2 


3.955 


00:42:45 


6.65 


0.00 


6.65 


HOOI 


1000^ 


100.0 


3.942 


04:01:36 


15.74 


0.00 


15.74 


SP 


250^ 


1.6 


3.934 


00:01:21 


0.01 


1.41 


1.42 


SP 


500^ 


12.5 


3.906 


00:10:21 


0.02 


4.00 


4.03 


SP 


750^ 


42.2 


3.896 


00:34:39 


0.06 


4.00 


4.06 


SP 


1000^ 


100.0 


3.893 


01:43:20 


0.11 


4.00 


4.12 


SP 


1250^ 


195.3 


3.890 


03:16:32 


0.21 


4.00 


4.22 


SP 


1500^ 


337.5 


3.888 


06:01:47 


0.33 


4.00 


4.33 


SP 


1750^ 


535.9 


3.886 


09:58:36 


0.54 


4.00 


4.54 


SP 


2000^ 


800.0 


3.885 


15:35:21 


0.78 


4.00 


4.79 


MP 


250^ 


1.6 


3.979 


00:01:45 


0.01 


1.41 


1.42 


MP 


500^ 


12.5 


3.930 


00:13:55 


0.03 


4.00 


4.03 


MP 


750^ 


42.2 


3.914 


00:51:33 


0.06 


4.00 


4.07 


MP 


1000^ 


100.0 


3.907 


02:21:30 


0.12 


4.00 


4.12 


MP 


1250^ 


195.3 


3.902 


05:05:11 


0.22 


4.00 


4.23 


MP 


1500^ 


337.5 


3.899 


09:28:49 


0.37 


4.00 


4.37 


MP 


1750^ 


535.9 


3.896 


16:14:01 


0.56 


4.00 


4.56 


MP 


2000^ 


800.0 


3.894 


25:43:17 


0.81 


4.00 


4.82 



Table 2: Performance of the four algorithms with tensors of varying size. 
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4.06% 
4.04% 
4.02% 
4.00% 

c^ 3.98% 

o 

O 3.96% 
Q. 

.ti 3.94% 
LL 

3.92% 
3.90% 
3.88% 
3.86% 




■ HOOl 
T MP 
▲ SP 
M HO-SVD 
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Figure 9: The fit of tlie four algoritlims as a function of the number of nonzeros. 
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Figure 10: The RAM use of the four algorithms as a function of the number of nonzeros. Note that the size 
of the sorting buffer for SP and MP was arbitrarily set to 4 GiB. 
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Figure 1 1 : The run time of the four algorithms as a function of the number of nonzeros. 



5.2 Varying Core Size Ratios 

SP is somewhat different from the other three algo- 
rithms, in that it has a kind of asymmetry. Compare 
Mi3 in Figure 5 with Mi in Figure 7. We could 
have used B instead of C, to calculate A in Fig- 
ure 5, but we arbitrarily chose C. We hypothesized 
that this asymmetry would make SP sensitive to vari- 
ation in the ratios of the core sizes. 

In this group of experiments, we vary the ratios 
between the sizes of the core in each mode, as listed 
in Table 3. The effect of the ratio on the performance 
is shown in Table 4. Figure 12 illustrates the effect 
of the ratio on the fit. It is clear from the figure that 
SP is asymmetrical, whereas HO-SVD, HOOI, and 
MP are symmetrical. 

This asymmetry of SP might be viewed as a flaw, 
and thus a reason for preferring MP over SP, but it 
could also be seen as an advantage for SP. In the 
case where the ratio is 0.2, SP has a better fit than 
MP. This suggests that we might use SP instead of 
MP when the ratios between the sizes of the core in 
each mode are highly skewed; however, we must be 
careful to make sure that SP processes the matrices 
in the optimal order for the given core sizes. 

Note that the relative ranking of the fit of the 
four algorithms is the same as in the previous group 



of experiments (best fit to worst: HOOI, MP, SP, 
HO-SVD), except in the case of extreme skew. Thus 
Figure 12 shows the robustness of the relative rank- 
ing. 

5.3 Fourth-Order Tensors 

This group of experiments demonstrates that the pre- 
vious observations regarding the relative ranking of 
the fit also apply to fourth-order tensors. The exper- 
iments also investigate the effect of varying the size 
of the core, with a fixed input tensor size. 

Table 5 lists the core sizes that we investigated. 
The effect of the core sizes on the performance is 
shown in Table 6. Figure 13 shows the impact of 
core size on fit. 

The fit varies from about 4% with a core of 10^ 
to about 44% with a core of 90^. To make the dif- 
ferences among the algorithms clearer, we normal- 
ized the fit by using HO-SVD as a baseline. The 
fit relative to HO-SVD is defined as the percentage 
improvement in the fit of the given algorithm, com- 
pared to the fit of HO-SVD. 

Figure 13 shows that the differences among the 
four algorithms are lai^gest when the core is about 
SO'^; that is, the size of one mode of the core (50) is 
about half of the size of one mode of the input tensor 
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Input tensor size 
(Ii X /2 X Is) 



Core size 
(Ji X J2 X J3) 



Ratio 

{J1/J2 = J2/J3) 



Density 

(%) 



Nonzeros 
(Millions) 



Input file 
(GiB) 



Output file 
(MiB) 



500 X 500 X 500 


250 X 50 X 10 


500 X 500 X 500 


125 X 50 X 20 


500 X 500 X 500 


83 X 50 X 30 


500 X 500 X 500 


63 X 50 X 40 


500 X 500 X 500 


50 X 50 X 50 


500 X 500 X 500 


40 X 50 X 63 


500 X 500 X 500 


30 X 50 X 83 


500 X 500 X 500 


20 X 50 X 125 


500 X 500 X 500 


10 X 50 X 250 



5.00 
2.50 
1.66 
1.26 
1.00 
0.80 
0.60 
0.40 
0.20 



10 
10 
10 
10 
10 
10 
10 
10 
10 



12.5 


0.24 


12.5 


0.24 


12.5 


0.24 


12.5 


0.24 


12.5 


0.24 


12.5 


0.24 


12.5 


0.24 


12.5 


0.24 


12.5 


0.24 



2.05 
1.63 
1.51 
1.48 
1.46 
1.48 
1.51 
1.63 
2.05 



Table 3: Random sparse third-order tensors with varying ratios between the sizes of the core in each mode. 
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Figure 12: The fit of the four algorithms as a function of the ratios between the sizes of the core in each 
mode. 
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Algorithm 


Ratio 


Fit 


Run time 


Matlab RAM 


Sort RAM 


Total RAM 




(J1/J2 = J2/J3) 


(%) 


(HH:MM:SS) 


(GiB) 


(GiB) 


(GiB) 


HO-SVD 


5.00 


3.881 


00:06:54 


2.71 


0.00 


2.71 


HO-SVD 


2.50 


3.883 


00:04:53 


1.96 


0.00 


1.96 


HO-SVD 


1.66 


3.883 


00:04:15 


1.78 


0.00 


1.78 


HO-SVD 


1.26 


3.884 


00:03:53 


1.96 


0.00 


1.96 


HO-SVD 


1.00 


3.883 


00:03:48 


1.96 


0.00 


1.96 


HO-SVD 


0.80 


3.884 


00:03:33 


1.96 


0.00 


1.96 


HO-SVD 


0.60 


3.883 


00:03:24 


1.96 


0.00 


1.96 


HO-SVD 


0.40 


3.883 


00:03:15 


1.96 


0.00 


1.96 


HO-SVD 


0.20 


3.881 


00:03:06 


1.96 


0.00 


1.96 


HOOI 


5.00 


3.969 


00:27:24 


2.72 


0.00 


2.72 


HOOI 


2.50 


3.983 


00:16:23 


2.02 


0.00 


2.02 


HOOI 


1.66 


3.982 


00:12:53 


1.98 


0.00 


1.98 


HOOI 


1.26 


3.982 


00:11:06 


1.98 


0.00 


1.98 


HOOI 


1.00 


3.982 


00:09:53 


1.98 


0.00 


1.98 


HOOI 


0.80 


3.982 


00:09:02 


1.98 


0.00 


1.98 


HOOI 


0.60 


3.982 


00:08:11 


1.98 


0.00 


1.98 


HOOI 


0.40 


3.982 


00:07:26 


1.99 


0.00 


1.99 


HOOI 


0.20 


3.965 


00:05:32 


2.02 


0.00 


2.02 


SP 


5.00 


3.896 


00:11:18 


0.02 


4.00 


4.02 


SP 


2.50 


3.900 


00:09:36 


0.02 


4.00 


4.02 


SP 


1.66 


3.902 


00:09:30 


0.02 


4.00 


4.02 


SP 


1.26 


3.905 


00:10:12 


0.02 


4.00 


4.03 


SP 


1.00 


3.906 


00:10:13 


0.02 


4.00 


4.03 


SP 


0.80 


3.908 


00:10:12 


0.03 


4.00 


4.03 


SP 


0.60 


3.910 


00:10:23 


0.03 


4.00 


4.03 


SP 


0.40 


3.914 


00:10:32 


0.04 


4.00 


4.04 


SP 


0.20 


3.920 


00:12:43 


0.06 


4.00 


4.07 


MP 


5.00 


3.901 


00:15:01 


0.02 


4.00 


4.03 


MP 


2.50 


3.917 


00:14:05 


0.02 


4.00 


4.02 


MP 


1.66 


3.925 


00:13:46 


0.02 


4.00 


4.03 


MP 


1.26 


3.929 


00:13:47 


0.02 


4.00 


4.03 


MP 


1.00 


3.930 


00:13:51 


0.03 


4.00 


4.03 


MP 


0.80 


3.930 


00:13:45 


0.03 


4.00 


4.03 


MP 


0.60 


3.927 


00:14:17 


0.03 


4.00 


4.04 


MP 


0.40 


3.922 


00:14:37 


0.04 


4.00 


4.04 


MP 


0.20 


3.905 


00:16:33 


0.06 


4.00 


4.07 



Table 4: Performance of the four algorithms with varying ratios between the sizes of the core in each mode. 
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Input tensor size 


Core size 


Density 


Nonzeros 


Input file 


Output file 


ih X I2 X I3 


x/4) 


(Jl X J2 X J3 X J4) 


(%) 


(Millions) 


(MiB) 


(MiB) 


100^ 




90^ 


10 


10 


197.13 


480.68 


100^ 




80^ 


10 


10 


197.13 


300.16 


100^ 




70^ 


10 


10 


197.13 


176.02 


100^ 




60^ 


10 


10 


197.13 


95.08 


100^ 




50^ 


10 


10 


197.13 


45.91 


100^ 




40^ 


10 


10 


197.13 


18.86 


100^ 




30^ 


10 


10 


197.13 


6.02 


100^ 




20^ 


10 


10 


197.13 


1.23 


lOO'^ 




10^ 


10 


10 


197.13 


0.10 



CD 

o 

Q. 
Q 

> 

I 

o 

X 
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> 

'■4— • 

0) 



Table 5 : Random sparse fourth-order tensors with varying core sizes. 
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Figure 13: The fit of the four algorithms as a function of the core sizes, given fourth-order tensors. 
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Algorithm 


Core Size 


Fit 


Relative fit 


Run time 


Matlab RAM 


Sort RAM 


Total RAM 






(%) 


(%) 


(HH:MM:SS) 


(GiB) 


(GiB) 


(GiB) 


HO-SVD 


90^ 


44.007 


0.00 


00:05:35 


3.56 


0.00 


3.56 


HO-SVD 


80^ 


26.477 


0.00 


00:05:03 


3.10 


0.00 


3.10 


HO-SVD 


70^ 


16.449 


0.00 


00:04:20 


3.02 


0.00 


3.02 


HO-SVD 


60^ 


10.463 


0.00 


00:03:52 


2.62 


0.00 


2.62 


HO-SVD 


50^ 


6.988 


0.00 


00:03:27 


2.07 


0.00 


2.07 


HO-SVD 


40^ 


5.116 


0.00 


00:03:13 


1.89 


0.00 


1.89 


HO-SVD 


30^ 


4.232 


0.00 


00:02:57 


1.80 


0.00 


1.80 


HO-SVD 


20^ 


3.903 


0.00 


00:02:48 


1.75 


0.00 


1.75 


HO-SVD 


10^ 


3.827 


0.00 


00:02:34 


1.80 


0.00 


1.80 


HOOI 


90^ 


44.065 


0.13 


00:18:35 


4.32 


0.00 


4.32 


HOOI 


80^ 


26.600 


0.47 


00:21:16 


3.75 


0.00 


3.75 


HOOI 


70^ 


16.609 


0.97 


00:17:50 


3.28 


0.00 


3.28 


HOOI 


60^ 


10.629 


1.59 


00:15:00 


2.88 


0.00 


2.88 


HOOI 


50^ 


7.135 


2.10 


00:12:44 


2.53 


0.00 


2.53 


HOOI 


40^ 


5.227 


2.17 


00:10:19 


2.24 


0.00 


2.24 


HOOI 


30^ 


4.301 


1.63 


00:08:42 


1.97 


0.00 


1.97 


HOOI 


20^ 


3.933 


0.75 


00:05:25 


1.81 


0.00 


1.81 


HOOI 


10^ 


3.834 


0.18 


00:04:34 


1.80 


0.00 


1.80 


SP 


90^ 


44.029 


0.05 


01:45:07 


2.19 


4.00 


6.19 


SP 


80^ 


26.517 


0.15 


01:31:50 


1.55 


4.00 


5.56 


SP 


70^ 


16.499 


0.30 


01:17:53 


1.06 


4.00 


5.07 


SP 


60^ 


10.511 


0.46 


01:09:49 


0.44 


4.00 


4.44 


SP 


50^ 


7.026 


0.54 


01:04:21 


0.38 


4.00 


4.38 


SP 


40^ 


5.140 


0.47 


01:02:37 


0.13 


4.00 


4.13 


SP 


30^ 


4.247 


0.35 


01:01:09 


0.07 


4.00 


4.08 


SP 


20^ 


3.908 


0.11 


00:59:02 


0.04 


4.00 


4.04 


SP 


10^ 


3.828 


0.01 


00:57:56 


0.01 


4.00 


4.02 


MP 


90^ 


44.039 


0.07 


03:16:44 


2.19 


4.00 


6.19 


MP 


80^ 


26.544 


0.25 


02:31:07 


1.55 


4.00 


5.56 


MP 


70^ 


16.532 


0.50 


01:57:17 


1.06 


4.00 


5.07 


MP 


60^ 


10.547 


0.80 


01:36:45 


0.69 


4.00 


4.70 


MP 


50^ 


7.057 


0.98 


01:23:33 


0.38 


4.00 


4.38 


MP 


40^ 


5.163 


0.91 


01:14:23 


0.17 


4.00 


4.18 


MP 


30^ 


4.259 


0.63 


01:07:01 


0.07 


4.00 


4.08 


MP 


20^ 


3.911 


0.20 


01:04:29 


0.04 


4.00 


4.04 


MP 


10^ 


3.828 


0.03 


01:05:26 


0.01 


4.00 


4.02 



Table 6: Performance of the four algorithms with fourth-order tensors and varying core sizes. Relative fit is 
the percentage increase in fit relative to HO-SVD. 
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(100). When the core is very small or very large, 
compared to the input tensor, there is little difference 
in fit among the algorithms. 

The fit follows the same trend here as in the pre- 
vious two groups of experiments (best to worst: 
HOOI, MP, SP, HO-SVD), in spite of the switch 
from third-order tensors to fourth-order tensors. 
This further confirms the robustness of the results. 

Table 6 shows that SP and MP ai^e slow with 
fourth-order tensors, compared to HO-SVD and 
HOOI. This is a change from what we observered 
with third-order tensors, which did not yield such 
large differences in run time. This is because a 
fourth-order tensor has many more slices than a 
third-order tensor with the same number of ele- 
ments, and each slice is smaller. There is a much 
larger overhead associated with opening and clos- 
ing many small files, compared to a few large files. 
This could be ameliorated by storing several adja- 
cent slices together in one file, instead of using a 
separate file for each slice. 

Even with third-order tensors, grouping slices to- 
gether in one file would improve the speed of SP 
and MP. Ideally, the user would specify the maxi- 
mum RAM available and SP and MP would group 
as many slices together as would fit in the available 
RAM. 

5.4 Performance with Real Data 

So far; all our experiments have used random ten- 
sors. Our purpose with this last group of experi- 
ments is to show that the previous observations ap- 
ply to nonrandom tensors. In particular, the differ- 
ences in fit that we have seen so far are somewhat 
small. It seems possible that the differences might 
not matter in a real application of tensors. This 
group of experiments shows that the differences in 
fit result in differences in performance on a real task. 

The task we examine here is answering multiple- 
choice synonym questions from the TOEFL test. 
This task was first investigated in Landauer and Du- 
mais (1997). In ongoing work, we are exploring the 
application of third-order tensors to this task, com- 
bining ideas from Landauer and Dumais (1997) and 
Turney (2006). 

Table 7 describes the input data and the output 
tensor decomposition. The first mode of the tensor 
consists of all of the 391 unique words that occur 
in the TOEFL questions. The second mode is a set 
of 849 words from Basic English, which is an artifi- 
cial language that reduces English to a small, easily 
learned core vocabulary (Ogden, 1930). The third 



mode consists of 1020 patterns that join the words 
in the first two modes. These patterns were gener- 
ated using the approach of Turney (2006). The value 
of an element in the tensor is derived from the fre- 
quency of the corresponding word pair and pattern 
in a large corpus. 

A TOEFL question consists of a stem word (the 
target word) and four choice words. The task is to 
select the choice word that is most similar in mean- 
ing to the stem word. Our approach is to measure the 
similarity of two TOEFL words by the average sim- 
ilarity of their relations to the Basic English words. 

Let X be our input tensor. Suppose we wish to 
measure the similarity of two TOEFL words. Let 
Xj:: and Xj:: bc the slices of X that correspond to 
the two TOEFL words. Each slice gives the weights 
for all of the patterns that join the given TOEFL 
word to all of the Basic English words. Our mea- 
sure of similarity between the TOEFL words is cal- 
culated by comparing the two slices. 

Table 8 presents the performance of the four algo- 
rithms. We see that the fit follows the familiar pat- 
tern: HOOI has the best fit, then MP, next SP, and 
lastly HO-SVD. Note that MP and SP have similar 
fits. The final column of the table gives the TOEFL 
scores for the four algorithms. HOOI has the best 
TOEFL score, MP and SP have the same score, and 
HO-SVD has the lowest score. The bottom row of 
the table gives the TOEFL score for the raw input 
tensor, without the benefit of any smoothing from 
the Tucker decomposition. The results validate the 
previous experiments with random tensors and illus- 
trate the value of the Tucker decomposition on a real 
task. 

6 Conclusions 

The Tucker decomposition has been with us since 
1966, but it seems that it has only recently started 
to become popular. We believe that this is because 
only recently has computer hardware reached the 
point where large tensor decompositions are becom- 
ing feasible. 

SVD started to attract interest in the field of in- 
formation retrieval when it was applied to "prob- 
lems of reasonable size (1000-2000 document ab- 
stracts; and 5000-7000 index terms)" (Deerwester et 
al., 1990). In collaborative filtering, SVD attracted 
interest when it achieved good results on the Net- 
flix Prize, a dataset with a sparse matrix of 17,000 
movies rated by 500,000 users. In realistic applica- 
tions, size matters. The MATLAB Tensor Toolbox 
(Bader and Kolda, 2007a; Bader and Kolda, 2007b) 
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Input tensor size 


(/l X I2 X I3) 


391 X 849 X 1020 


Core size 


(Jl X J2 X J3) 


250 X 250 X 250 


Input file 


(MiB) 


345 


Output file 


(MiB) 


119 


Density 


(% Nonzero) 


5.27 


Nonzeros 


(Millions) 


18 



Table 7: Description of the input data and the output decomposition. 



Algorithm Fit Relative fit Runtime Matlab RAM Sort RAM Total RAM TOEFL 
(%) (%) (HH:MM:SS) (GiB) (GiB) (GiB) (%) 



HO-SVD 


21.716 


0.00 


00:10:28 


5.29 


0.00 


5.29 


80.00 


HOOI 


22.597 


4.05 


00:56:08 


5.77 


0.00 


5.77 


83.75 


SP 


22.321 


2.78 


00:30:02 


0.33 


4.00 


4.33 


81.25 


MP 


22.371 


3.01 


00:43:52 


0.33 


4.00 


4.34 


81.25 


Raw tensor 


- 


- 


- 


- 


- 


- 


67.50 



Table 8: Performance of the four algorithms with actual data. Relative fit is the percentage increase in fit 
relative to HO-SVD. 



has done much to make tensor decompositions more 
accessible and easier to experiment with, but, as we 
have seen here, RAM requirements become prob- 
lematic with tensors larger than 1000^. 

The aim of this paper has been to empirically eval- 
uate four tensor decompositions, to study their fit 
and their time and space requirements. Our primary 
concern was the ability of the algorithms to scale up 
to large tensors. The implementations of HO-SVD 
and HOOI, taken from the MATLAB Tensor Tool- 
box, assumed that the input tensor could fit in RAM, 
which limited them to tensors of size 1000'^. On the 
other hand, SP and MP were able to process tensors 
of size 2000^, with eight times more elements. 

The experiments in Section 5.4 suggest that the 
differences in fit among the four algorithms corre- 
spond to differences in performance on real tasks. 
It seems likely that good fit will be important for 
many applications; therefore, we recommend HOOI 
for those tensors that can fit in the available RAM, 
and MP for larger tensors. 
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Appendix: MATLAB Source for Multislice Projection 



function fit = multislice (data_dir, sparse_file, tuck.er_file, I, J) 
%MULTISLICE is a low RAM Tucker decomposition 

% Peter Turney 

% October 26, 2007 

% Copyright 2007, National Research Council of Canada 

% This program is free software: you can redistribute it and/or modify 

% it under the terms of the GNU General Public License as published by 

% the Free Software Foundation, either version 3 of the License, or 

% (at your option) any later version. 

% 

% This program is distributed in the hope that it will be useful, 

% but WITHOUT ANY WARRANTY; without even the implied warranty of 

% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 

% GNU General Public License for more details. 

% You should have received a copy of the GNU General Public License 

% along with this program. If not, see <http://www.gnu.org/licenses/>. 

%% set parameters 

fprintf ('MULTISLICE is running . . . \n' ) ; 

maxloops =50; % maximum number of iterations 

eigopts.disp = 0; % suppress messages from eigsO 
minfitchange = le-4; % minimum change in fit of tensor 

%% make slices of input data file 

fprintf (' preparing slicesXn' ) ; 

model_dir = 'slicel'; 
mode2_dir = 'slice2'; 
mode3_dir = 'sliceS'; 

slice (data_dir, sparse_f ile,model_dir, 1,1) 
slice (data_dir, sparse_f ile, mode2_dir, 2,1) 
slice (data_dir, spar se_f ile, mode3_dir, 3,1) 

%% pseudo HO-SVD initialization 

% initialize B 

M2 = zeros (I (2) , I (2) ) ; 
for i = 1:1(3) 

X3_slice = load_slice (data_dir,mode3_dir, i) ; 

M2 = M2 + (X3_slice' * X3_slice) ; 
end 
for i = 1:1(1) 

Xl_slice = load_slice (data_dir , model_dir , i) ; 

M2 = M2 + (Xl_slice * Xl_slice' ) ; 
end 
[B,D] = eigs (M2*M2' , J(2) , ' Im' , eigopts) ; 

% initialize C 

M3 = zeros (I (3) ,1(3)) ; 
for i = 1:1(1) 

Xl_slice = load_slice (data_dir , model_dir , i) ; 

M3 = M3 + (Xl_slice' * Xl_slice); 
end 
for i = 1:1(2) 

X2_slice = load_slice (data_dir , mode2_dir , i) ; 

M3 = M3 + (X2_slice' * X2_slice) ; 
end 
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[C,D] = eigs (M3*M3' , J(3) , ' Im' ,eigopts) ; 
%% main loop 
old_fit = 0; 

fprintfC entering main loop of MULTISLICE\n' ) ; 
for loop_num = limaxloops 
% update A 

Ml = zeros (I (1) ,1(1)) ; 

for i = 1:1 (2) 

X2_slice = load_slice (data_dir , mode2_dir , i) ; 
M1 = Ml + ( (X2_slice * C) * (C * X2_slice' ) ) ; 

end 

for i = 1:1(3) 

X3_slice = load_slice (data_dir , mode3_dir , i) ,■ 
M1 = Ml + ( (X3_slice * B) * (B' * X3_slice' ) ) ; 

end 

[A,D] = eigs (M1*M1' , J(l) , ' Im' ,eigopts) ; 

% update B 

M2 = zeros (I (2) , I (2) ) ; 

for i = 1:1(3) 

X3_slice = load_slice (data_dir, mode3_dir, i) ; 
M2 = M2 + ((X3_slice' * A) * (A' * X3_slice)); 

end 

for i = 1:1(1) 

Xl_slice = load_slice (data_dir , model_dir , i) ,• 
M2 = M2 + ( (Xl_slice * C) * (C * Xl_slice' ) ) ; 

end 

[B,D] = eigs (M2*M2' , J(2) , ' Im' ,eigopts) ; 

% update C 

M3 = zeros (I (3) ,1(3)) ; 

for i = 1:1(1) 

Xl_slice = load_slice (data_dir , model_dir , i) ; 
M3 = M3 + ( (Xl_slice' * B) * (B' * Xl_slice)); 

end 

for i = 1:1(2) 

X2_slice = load_slice (data_dir , mode2_dir , i) ; 
M3 = M3 + ((X2_slice' * A) * (A' * X2_slice)); 

end 

[C,D] = eigs (M3*M3' , J(3) , ' Im' ,eigopts) ; 

% build the core 

G = zeros (I (1) *J(2) *J(3) , 1) ; 
G = reshape (G, [I (1) J(2) J(3)]); 
for i = 1:1(1) 

Xl_slice = load_slice (data_dir, model_dir , i) ; 

G(i,:,:) = B' * Xl_slice * C; 
end 

G = reshape (G, [I (1) ( J (2 ) * J (3) ) ] ) ; 
G = A' * G; 
G = reshape (G, [J(l) J(2) J(3)]); 

% measure fit 

normX = 0; 
sqerr = ; 
for i = 1:1(1) 

Xl_slice = load_slice (data_dir , model_dir , i) ; 

Xl_approx = reshape (G, [J(l) ( J (2 ) * J (3) ) ] ) ; 

Xl_approx = A(i, :) * Xl_approx; 

Xl_approx = reshape (Xl_approx, [J(2) J(3)]); 

Xl_approx = B * Xl_approx * C ; 
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sqerr = sqerr + norm (Xl_slice-Xl_approx, ' f ro' ) "2 ; 

normX = normX + norm (Xl_slice, ' f ro' ) "2 ; 
end 
fit = 1 - sqrt (sqerr) / sqrt (normX) ; 

fprintf(' loop %d: fit = %f \n' , loop_num, fit); 

% stop if fit is not increasing fast enough 

if ((fit - old_fit) < mint itchange) 

break; 
end 

old_fit = fit; 

end 

fprintf(' total loops = %d\n' , loop_num) ; 

%% save tensor 

output_file = [data_dir, '/', tuck:er_file] ; 
save (output_file, ' G' , ' A' , ' B' , ' C ) ; 

fprintf(' tucker tensor is in %s\n' , tucker_f ile) ; 

fprintf ('MULTISLICE is done\n' ) ; 



function slice (data_dir, spar se_f ile, mode_slice_dir , mode, I ) 
%SLICE chops a tensor into slices along the given mode 

% Peter Turney 

% October 20, 2007 

% Copyright 2007, National Research Council of Canada 

%% initialize 

% set the secondary modes 

if (mode == 1) 

r_mode = 2 ; 

c_mode = 3; 
elseif (mode == 2) 

r_mode = 1; 

c_mode = 3; 
else 

r_mode = 1; 

c_mode = 2 ; 
end 

% get sizes 

Ns = I (mode); % number of slices 

Nr = I(r_mode); % number of rows in each slice 

Nc = I(c_mode); % number of columns in each slice 

%% sort the index 

fprintf (' SLICE is running . . . \n' ) ; 

% file names 

sub_dir = [data_dir, '/', mode_slice_dir] ; 
sorted_file = [sub_dir, '/', 'sorted.txt']; 

% make sure the directories exist 
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if (isdir (data_dir) == 0) 

mkdir (data_dir) ; 
end 
if (isdir (sub_dir) == 0) 

mkdir (sub_dir) ; 
end 

% sort 

sort_index (data_dir , sparse_f ile, mode_slice_dir , mode) ; 

%% count nonzeros in each slice 

fprintf(' counting nonzeros in each slice for mode %d\n' , mode) ; 

% vector for storing nonzero count 

nonzeros = zeros (Ns , 1 ) ; 

% read sorted file in blocks 

% - read in blocks because file may be too big to fit in RAM 

% - textscan will create one cell for each field 

% - each cell will contain a column vector of the values in 

% the given field 

% - the number of elements in each column vector is the number 

% of lines that were read 

desired_lines = 100000; 
actual_lines = desired_lines; 

sorted_f ile_id = f open (sorted_f ile, ' r' ) ; 
while (actual_lines > 0) 

block = textscan {sorted_file_id, ' %d %d %d %*f ' , desired_lines) ; 

mode_subs = block{mode } ; 

actual_lines = size (mode_subs, 1) ; 

for i = 1 : actual_lines 

nonzeros (mode_subs (i) ) = nonzeros (mode_subs ( i) ) + 1; 

end 
end 
f close (sorted_f ile_id) ; 

%% make slices 

fprintf(' saving slices for mode %d\n' , mode) ; 

sorted_f ile_id = f open (sorted_f ile, ' r' ) ; 
for i = l:Ns 

slice_file = sprintf ( ' %s/slice%d.mat ' , sub_dir, i); 

nonz = nonzeros (i) ; 

block = textscan (sorted_file_id, ' %d %d %d %f',nonz); 

slice_rows = double (block { r_mode }) ; 

slice_cols = double (block { c_mode }) ; 

slice_vals = block{4}; 

slice = sparse (slice_rows, slice_cols, slice_vals, Nr, Nc, nonz ) ; 

save (slice_f ile, ' slice' ) ; 
end 
f close (sorted_f ile_id) ; 

fprintf (' SLICE is done\n' ) ; 



function sort_index (data_dir, spar se_f ile, mode_slice_dir , mode) 
%SORT_INDEX sorts a sparse tensor index file along the given mode 

% Peter Turney 

% October 20, 2007 

% Copyright 2007, National Research Council of Canada 
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%% sort the index 

fprintf (' SORT_INDEX is running ...\n'); 

% file names 

input_file = [data_dir, '/', sparse_file] ; 
sub_dir = [data_dir, '/', mode_slice_dir] ; 
sorted_file = [sub_dir, ' /' , 'sorted.txt']; 

% call Unix 'sort' command 

% -n = numerical sorting 

% -k = key to sort on 

% -s = stable sorting 

% -S = memory for sorting buffer 

% -o = output file 

% - the 'sort' command is a standard part of Unix and Linux 

% - if you are running Windows, you can get 'sort' by 

% installing Cygwin 

% - the sort buffer is set here to 1 GiB; you can set it 

% to some other value, based on how much RAM you have 

command = sprintf ( ' sort -n -s -S IG -k %d, %d -o %s %s', ... 
mode, mode, sorted_file, input_file) ; 

fprintf (' calling Unix sort for mode %d\n' , mode); 
Unix (command) ; 

fprintf (' SORT_INDEX is done\n' ) ; 



function slice = load_slice (data_dir , mode_dir, i ) 
%LOAD_SLICE loads a sparse slice file 

% Peter Turney 

% October 20, 2007 

% Copyright 2007, National Research Council of Canada 

% file name 

slice_file = sprintf (' %s/%s/slice%d. mat' , data_dir, mode_dir, i) ; 

'a 

% load the file 
data = load (slice_f lie) ; 
% return the slice 
slice = data. slice; 



function test 

%TEST illustrates how to use multislice.m 

% Peter Turney 

% October 26, 2007 

% Copyright 2007, National Research Council of Canada 

% test multislice.m 

% set random seed for repeatable experiments 
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randC seed' , 5678) ; 

% set parameters 

I = [100 110 120]; % input sparse tensor size 
J = [10 11 12]; % desired core tensor size 

density =0.1; % percent nonzero 

data_dir = 'test'; % directory for storing tensor 

sparse_file = 'spten.txt'; % file for storing raw data tensor 
tucker_file = ' tucker .mat ' ; % file for storing Tucker tensor 

% make a sparse random tensor and store it in a file 

sparse_random_tensor (data_dir, sparse_f ile, I, density) ; 

% call multislice 

tic; 

fit = multislice (data_dir, sparse_f ile, tucker_f ile, I, J) ; 

time = toe; 

% show results 

fprintf (' \n' ) ; 

fprintf ('Multislice : \n' ) ; 

fprintf ('I = [%d %d %d] \n' , 1(1), 1(2), 1(3)); 

fprintfCJ = [%d %d %d] \n' , J(l), J(2), J(3)); 

fprintf (' density = %f \n' , density); 

fprintf ('fit = %f \n' , fit); 

fprintf (' time = %.lf\n', time); 

fprintf (' \n' ) ; 



function sparse_random_tensor (data_dir , spar se_f ile, I , density ) 
%SPARSE_RANDOM_TENSOR makes a sparse uniformly distributed random tensor 

% Peter Turney 

% October 20, 2007 

% Copyright 2007, National Research Council of Canada 

% assume a third-order tensor is desired 

%% initialize 

fprintf ( ' SPARSE_RANDOM_TENSOR is running . . . \n' ) ; 

% make sure the directory exists 

if (isdir (data_dir) == 0) 

mkdir (data_dir) ; 
end 

file_name = [data_dir, '/', sparse_file] ; 

fprintf (' generating tensor of size %d x %d x %d with density %f \n' , 
1(1), 1(2), 1(3), density); 

%% main loop 

file_id = f open ( f ile_name, ' w' ) ; 
fprintf (' slice: '); 
for il = 1:1(1) 

fprintf ('%d ',il); % show progress 

if ((mod(il,10) == 0) && (il ~= 1(1))) 

fprintf ('\n '); % time for new line 

end 

for 12 = 1:1 (2) 
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for 13 = 1:1(3) 

If (rand < density) 

f prlntf (f lle_ld, ' %d %d %d %f \n' , 11, 12, 13, rand) ; 
end 
end 
end 
end 

fprlntf (' \n' ) ; 
fclose (f lle_ld) ; 

fprlntf (' SPARSE_RANDOM_TENSOR Is done\n' ) ; 
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